72 research outputs found

    SSMap: A new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help users on this undertaking by providing complete cross-references to Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap – a new UniProt-PDB residue-residue level mapping – was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to palliate a number of shortcomings of existent mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it stresses on the correct reconstruction of the PDB sequence from structures, and on the correct attribution of a UniProtKB entry to each PDB chain by using a series of post-processing steps.</p> <p>Results</p> <p>SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries, and of the quality of the pairwise alignments supporting the residue-residue mapping. It was found that SSMap shared about 80% of the mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good as assessed by manual verification of data subsets. As for local pairwise alignments, it was shown that major discrepancies (both in terms of alignment lengths and boundaries), when present, were often due to differences in methodologies used for the mappings.</p> <p>Conclusion</p> <p>SSMap provides an independent, good quality UniProt-PDB mapping. The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.</p

    The protein structure initiative structural genomics knowledgebase

    Get PDF
    The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities

    DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs

    Get PDF
    DrugBank (http://www.drugbank.ca) is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data ‘depth’). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug–drug and food–drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of ‘omics’ (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications

    SchistoDB: a Schistosoma mansoni genome resource

    Get PDF
    SchistoDB (http://schistoDB.net/) is a genomic database for the parasitic organism Schistosoma mansoni, one of the major causative agents of schistosomiasis worldwide. It currently incorporates sequences and annotation for S. mansoni in a single user-friendly database. Several genomic scale analyses are available as well as ESTs, oligonucleotides, metabolic pathways and drugs. In this article, we describe the data sets and its analyses, how to query the database and tools available in the website

    The Significance of the ProtDeform Score for Structure Prediction and Alignment

    Get PDF
    Background: When a researcher uses a program to align two proteins and gets a score, one of her main concerns is how often the program gives a similar score to pairs that are or are not in the same fold. This issue was analysed in detail recently for the program TM-align with its associated TM-score. It was shown that because the TM-score is length independent, it allows a P-value and a hit probability to be defined depending only on the score. Also, it was found that the TM-scores of gapless alignments closely follow an Extreme Value Distribution (EVD). The program ProtDeform for structural protein alignment was developed recently and is characterised by the ability to propose different transformations of different protein regions. Our goal is to analyse its associated score to allow a researcher to have objective reasons to prefer one aligner over another, and carry out a better interpretation of the output. Results: The study on the ProtDeform score reveals that it is length independent in a wider score range than TM-scores and that PD-scores of gapless (random) alignments also approximately follow an EVD. On the CASP8 predictions, PD-scores and TM-scores, with respect to native structures, are highly correlated (0.95), and show that around a fifth of the predictions have a quality as low as 99.5 % of the random scores. Using the Gold Standard benchmark, ProtDeform has lower probabilities of error than TM-align both at a similar speed. The analysis is extended to homology discrimination showing that, again, ProtDeform offers higher hit probabilities than TM-align. Finally, we suggest using three different P-value

    The EMBRACE web service collection

    Get PDF
    The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions

    BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology.</p> <p>Results</p> <p>Here we present an ultrafast method, named BSSF(Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins.</p> <p>Conclusions</p> <p>This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of hit list from database searching.</p

    BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology.</p> <p>Results</p> <p>Here we present an ultrafast method, named BSSF(Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins.</p> <p>Conclusions</p> <p>This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of hit list from database searching.</p

    RASOnD - A comprehensive resource and search tool for RAS superfamily oncogenes from various species

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The Ras superfamily plays an important role in the control of cell signalling and division. Mutations in the Ras genes convert them into active oncogenes. The Ras oncogenes form a major thrust of global cancer research as they are involved in the development and progression of tumors. This has resulted in the exponential growth of data on Ras superfamily across different public databases and in literature. However, no dedicated public resource is currently available for data mining and analysis on this family. The present database was developed to facilitate straightforward accession, retrieval and analysis of information available on Ras oncogenes from one particular site.</p> <p>Description</p> <p>We have developed the RAS Oncogene Database (RASOnD) as a comprehensive knowledgebase that provides integrated and curated information on a single platform for oncogenes of Ras superfamily. RASOnD encompasses exhaustive genomics and proteomics data existing across diverse publicly accessible databases. This resource presently includes overall 199,046 entries from 101 different species. It provides a search tool to generate information about their nucleotide and amino acid sequences, single nucleotide polymorphisms, chromosome positions, orthologies, motifs, structures, related pathways and associated diseases. We have implemented a number of user-friendly search interfaces and sequence analysis tools. At present the user can (i) browse the data (ii) search any field through a simple or advance search interface and (iii) perform a BLAST search and subsequently CLUSTALW multiple sequence alignment by selecting sequences of Ras oncogenes. The Generic gene browser, GBrowse, JMOL for structural visualization and TREEVIEW for phylograms have been integrated for clear perception of retrieved data. External links to related databases have been included in RASOnD.</p> <p>Conclusions</p> <p>This database is a resource and search tool dedicated to Ras oncogenes. It has utility to cancer biologists and cell molecular biologists as it is a ready source for research, identification and elucidation of the role of these oncogenes. The data generated can be used for understanding the relationship between the Ras oncogenes and their association with cancer. The database updated monthly is freely accessible online at <url>http://202.141.47.181/rasond/</url> and <url>http://www.aiims.edu/RAS.html</url>.</p
    corecore